Guest kernel crashes with GPF on volume attach

Bug #2018612 reported by Dan Smith
This bug affects 1 person
Affects                    Status    Importance  Assigned to  Milestone
OpenStack Compute (nova)   Triaged   Undecided   Unassigned
linux (Ubuntu)             Invalid   Undecided   Unassigned

Bug Description

This isn't really a bug in nova, but it's something that we're hitting in CI quite a bit, so I'm filing here to record the details and so I can recheck against it. The actual bug is in the guest (CirrOS 0.5.2) kernel, in QEMU, or in something similar. In tests where we attach a volume to a running guest, we occasionally get a guest kernel crash and stack trace, after which pretty much nothing else works for the rest of the test.

Here's what the trace looks like:

[ 10.152160] virtio_blk virtio2: [vda] 2093056 512-byte logical blocks (1.07 GB/1022 MiB)
[ 10.198313] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 10.199033] GPT:229375 != 2093055
[ 10.199278] GPT:Alternate GPT header not at the end of the disk.
[ 10.199632] GPT:229375 != 2093055
[ 10.199857] GPT: Use GNU Parted to correct GPT errors.
[ 11.291631] random: fast init done
[ 11.312007] random: crng init done
[ 11.419215] general protection fault: 0000 [#1] SMP PTI
[ 11.420843] CPU: 0 PID: 199 Comm: modprobe Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu
[ 11.421917] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 11.424732] RIP: 0010:__kmalloc_track_caller+0xa1/0x250
[ 11.425934] Code: 65 49 8b 50 08 65 4c 03 05 b4 48 37 6f 4d 8b 38 4d 85 ff 0f 84 77 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8 4c 01 fb <48> 33 1b 49 33 99 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[ 11.428460] RSP: 0018:ffffb524801afaf0 EFLAGS: 00000206
[ 11.429261] RAX: 51f2a72f63305b11 RBX: 51f2a72f63305b11 RCX: 0000000000002b7e
[ 11.430205] RDX: 0000000000002b7d RSI: 0000000000000cc0 RDI: 000000000002f040
[ 11.431123] RBP: ffffb524801afb28 R08: ffff90480762f040 R09: ffff904807001c40
[ 11.432032] R10: ffffb524801afc28 R11: 0000000000000001 R12: 0000000000000cc0
[ 11.432953] R13: 0000000000000004 R14: ffff904807001c40 R15: 51f2a72f63305b11
[ 11.434125] FS: 00007fb31d2486a0(0000) GS:ffff904807600000(0000) knlGS:0000000000000000
[ 11.435139] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.435909] CR2: 0000000000abf9a8 CR3: 00000000027c2000 CR4: 00000000000006f0
[ 11.437208] Call Trace:
[ 11.438716] ? kstrdup_const+0x24/0x30
[ 11.439170] kstrdup+0x31/0x60
[ 11.439668] kstrdup_const+0x24/0x30
[ 11.440036] kvasprintf_const+0x86/0xa0
[ 11.440397] kobject_set_name_vargs+0x23/0x90
[ 11.440791] kobject_set_name+0x49/0x70
[ 11.452382] bus_register+0x80/0x270
[ 11.462448] ? 0xffffffffc033b000
[ 11.471469] hid_init+0x2b/0x62 [hid]
[ 11.480198] do_one_initcall+0x4a/0x1fa
[ 11.487738] ? _cond_resched+0x19/0x40
[ 11.495227] ? kmem_cache_alloc_trace+0x1ff/0x210
[ 11.502700] do_init_module+0x5f/0x227
[ 11.510944] load_module+0x1b96/0x2140
[ 11.517993] __do_sys_finit_module+0xfc/0x120
[ 11.525101] ? __do_sys_finit_module+0xfc/0x120
[ 11.533182] __x64_sys_finit_module+0x1a/0x20
[ 11.542123] do_syscall_64+0x5a/0x130
[ 11.549183] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 11.557921] RIP: 0033:0x7fb31cbaba7d
[ 11.565182] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 79 9e 02 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3a fd ff ff c3 48 c7 c6 01 00 00 00 e9 a1
[ 11.581697] RSP: 002b:00007ffdf6793c18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[ 11.589245] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb31cbaba7d
[ 11.597913] RDX: 0000000000000000 RSI: 00000000004ab235 RDI: 0000000000000003
[ 11.605694] RBP: 00000000004ab235 R08: 00000000000000c7 R09: 00007fb31cbeba5f
[ 11.613566] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000003
[ 11.620772] R13: 0000000000ab3c70 R14: 0000000000ab3cc0 R15: 0000000000000000
[ 11.628586] Modules linked in: hid(+) virtio_rng virtio_gpu drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_scsi virtio_net net_failover failover virtio_input virtio_blk qemu_fw_cfg 9pnet_virtio 9pnet pcnet32 8139cp mii ne2k_pci 8390 e1000
[ 11.654944] ---[ end trace 9a9e8eebda38a127 ]---
[ 11.663441] RIP: 0010:__kmalloc_track_caller+0xa1/0x250
[ 11.671942] Code: 65 49 8b 50 08 65 4c 03 05 b4 48 37 6f 4d 8b 38 4d 85 ff 0f 84 77 01 00 00 41 8b 59 20 49 8b 39 48 8d 4a 01 4c 89 f8 4c 01 fb <48> 33 1b 49 33 99 70 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd
[ 11.689167] RSP: 0018:ffffb524801afaf0 EFLAGS: 00000206
[ 11.698903] RAX: 51f2a72f63305b11 RBX: 51f2a72f63305b11 RCX: 0000000000002b7e
[ 11.707107] RDX: 0000000000002b7d RSI: 0000000000000cc0 RDI: 000000000002f040
[ 11.715748] RBP: ffffb524801afb28 R08: ffff90480762f040 R09: ffff904807001c40
[ 11.724372] R10: ffffb524801afc28 R11: 0000000000000001 R12: 0000000000000cc0
[ 11.735147] R13: 0000000000000004 R14: ffff904807001c40 R15: 51f2a72f63305b11
[ 11.747065] FS: 00007fb31d2486a0(0000) GS:ffff904807600000(0000) knlGS:0000000000000000
[ 11.755136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.763985] CR2: 0000000000abf9a8 CR3: 00000000027c2000 CR4: 00000000000006f0
Segmentation fault
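
Since this failure is being tracked for rechecks, a small helper that spots the signature in a guest console log may be useful. This is only a sketch: the function name `find_gpf` and both regular expressions are illustrative assumptions derived from the trace above, not part of nova or any CI tooling.

```python
import re

# Assumed pattern for lines such as:
#   [ 11.419215] general protection fault: 0000 [#1] SMP PTI
GPF_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+general protection fault:")

# Assumed pattern for the "CPU: ... Comm: <task> ... <kernel release>" line.
CPU_RE = re.compile(r"Comm:\s+(\S+)\s+.*?(\d+\.\d+\.\d+-\d+-\S+)")


def find_gpf(console_log):
    """Return details of the first general protection fault, or None.

    The result holds the kernel timestamp of the fault plus, when the
    following "CPU:" line is present, the faulting task name and the
    kernel release string.
    """
    lines = [line.strip() for line in console_log.splitlines()]
    for i, line in enumerate(lines):
        match = GPF_RE.match(line)
        if match is None:
            continue
        result = {"timestamp": float(match.group(1)), "comm": None, "kernel": None}
        # The CPU/Comm line normally follows the fault line directly;
        # look at a few lines in case of interleaved output.
        for follow in lines[i + 1:i + 4]:
            cpu = CPU_RE.search(follow)
            if cpu is not None:
                result["comm"] = cpu.group(1)
                result["kernel"] = cpu.group(2)
                break
        return result
    return None
```

Run against the trace above, this would report the fault in `modprobe` on the 5.3.0-26-generic guest kernel.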

Tags: gate-failure
Dan Smith (danms)
tags: added: gate-failure
Dan Smith (danms)
Changed in nova:
status: New → Triaged
Kashyap Chamarthy (kashyapc) wrote :

Adding some environment details, as we've tagged the kernel component here.

Environment
-----------

This is a nested setup (the default in upstream OpenStack CI): QEMU on KVM, i.e. the level-2 guest is emulated, without KVM acceleration.

  - Level-0 (bare-metal) host: [We don't know the details of this, as this is provided by the
    cloud provider]
  - Level-1 (L1) guest ("guest hypervisor") kernel: 5.15.0-71-generic
  - [L1] libvirt version: 8.0.0, package: 1ubuntu7.4
  - [L1] QEMU version: 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.8)
  - Level-2 (L2) guest: CirrOS 0.5.2; and its kernel is the same as the
    associated Ubuntu LTS kernel
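
To confirm from the L1 host that an L2 guest really is emulated rather than KVM-accelerated, one can look at the `type` attribute of the root `<domain>` element in the libvirt domain XML (`qemu` for plain emulation, `kvm` for KVM). A minimal sketch, assuming the XML has already been obtained (for example via `virsh dumpxml`); the helper name `domain_accel` is hypothetical:

```python
import xml.etree.ElementTree as ET


def domain_accel(domain_xml):
    """Return the 'type' attribute of a libvirt <domain> element.

    'qemu' means the guest is emulated without KVM acceleration;
    'kvm' means it is KVM-accelerated.
    """
    root = ET.fromstring(domain_xml)
    if root.tag != "domain":
        raise ValueError("not a libvirt domain XML document")
    return root.get("type", "unknown")


# Hypothetical, heavily trimmed domain XML for illustration only.
example = "<domain type='qemu'><name>instance-00000001</name></domain>"
print(domain_accel(example))
```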

Juerg Haefliger (juergh)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Juerg Haefliger (juergh) wrote :

So this is a crash of the guest kernel from a CirrOS 0.5.2 image. That image contains a 5.3 kernel, which is no longer supported. The 0.5.2 image is a minor update of the original 0.5.1 image, which was built before the release of Focal. It was pulling in the HWE kernel, which at that time was a 5.3 kernel. The updated 0.5.2 CirrOS image apparently didn't upgrade the kernel, hence it's also on an unsupported 5.3 kernel.

The current (supported) Bionic HWE kernel is 5.4 (same version as the Focal release kernel).

Long story short, you're running on an old/unsupported kernel, so the first step is to upgrade the kernel and see if the problem goes away. Also, CirrOS is not a supported distro/image, so closing the bug as 'invalid'.

We can revisit if you run into issues with a supported 5.4 kernel. In that case, please open a new ticket.
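
For anyone triaging similar reports, the version check described above can be sketched as follows. The 5.4 threshold comes from this comment; the helper names `kernel_series` and `is_supported` are made up for illustration, and the check only compares the major.minor series, not the full package version.

```python
import re

# Minimum supported series per the comment above (Bionic HWE / Focal kernel).
MIN_SUPPORTED = (5, 4)


def kernel_series(release):
    """Extract (major, minor) from a release string like '5.3.0-26-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2)))


def is_supported(release, minimum=MIN_SUPPORTED):
    """Return True if the kernel series is at least the given minimum."""
    return kernel_series(release) >= minimum


# The guest kernel from the trace versus the L1 kernel from the environment notes.
for rel in ("5.3.0-26-generic", "5.15.0-71-generic"):
    print(rel, "supported" if is_supported(rel) else "unsupported")
```

Comparing integer tuples avoids the string-comparison pitfall where "5.15" would sort before "5.4".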

Changed in linux (Ubuntu):
status: New → Invalid
Kashyap Chamarthy (kashyapc) wrote :

Hi, Juerg. Thanks for the diagnosis. I've filed this[1] CirrOS upstream issue.

[1] "Update CirrOS 0.5.2 kernel image to a supported version (kernel-5.4; from Focal)" https://github.com/cirros-dev/cirros/issues/102
